Parallel Processing of Large Graphs
More and more large data collections are gathered worldwide in various IT
systems. Many of them have a networked nature and need to be processed and
analysed as graph structures. Because of their size, they very often require a
parallel paradigm for efficient computation. Three parallel techniques have
been compared in the paper: MapReduce, its map-side join extension and Bulk
Synchronous Parallel (BSP). They are implemented for two different graph
problems: calculation of single source shortest paths (SSSP) and collective
classification of graph nodes by means of relational influence propagation
(RIP). The methods and algorithms are applied to several network datasets
differing in size and structural profile, originating from three domains:
telecommunication, multimedia and microblog. The results revealed that
iterative graph processing with the BSP implementation always and
significantly outperforms MapReduce, by up to 10 times, especially for
algorithms with many iterations and sparse communication. The MapReduce
extension based on map-side join is also usually noticeably more efficient,
although not by as much as BSP. Nevertheless, MapReduce remains a good
alternative for enormous networks whose data structures do not fit in local
memories.
Comment: Preprint submitted to Future Generation Computer Systems
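To make the BSP formulation of SSSP more concrete, here is a minimal single-process sketch. It is not the paper's implementation: the function name `bsp_sssp`, the adjacency-dictionary input format and the toy graph are assumptions chosen for illustration, and edge weights are assumed to be non-negative. Each pass of the `while` loop plays the role of one superstep: every vertex that received messages keeps the smallest candidate distance and, if it improved, sends new candidates to its neighbours for the next superstep.

```python
from collections import defaultdict
import math


def bsp_sssp(edges, source):
    """Single-source shortest paths in BSP-style supersteps (illustrative).

    edges: dict mapping a vertex to a list of (neighbour, weight) pairs.
    Returns (distances, number_of_supersteps).
    """
    # Collect every vertex, including those that only appear as targets.
    vertices = set(edges)
    for neighbours in edges.values():
        for v, _ in neighbours:
            vertices.add(v)

    dist = {v: math.inf for v in vertices}
    inbox = {source: [0.0]}  # messages delivered at the start of a superstep
    supersteps = 0

    while inbox:  # stop when no vertex has pending messages
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            best = min(messages)
            if best < dist[v]:
                # The vertex improved, so it notifies its neighbours.
                dist[v] = best
                for nbr, w in edges.get(v, []):
                    outbox[nbr].append(best + w)
        inbox = outbox  # barrier: messages become visible in the next superstep
        supersteps += 1

    return dist, supersteps


# Hypothetical toy graph, used only to exercise the sketch.
graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
distances, steps = bsp_sssp(graph, "a")
```

In a real BSP system the vertices would be spread across workers and the barrier would be a synchronisation point between machines; the sketch only mirrors the message-passing logic.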
How is a data-driven approach better than random choice in label space division for multi-label classification?
We propose using five data-driven community detection approaches from social
networks to partition the label space for the task of multi-label
classification as an alternative to random partitioning into equal subsets as
performed by RAkELd: modularity-maximizing fastgreedy and leading eigenvector,
infomap, walktrap and label propagation algorithms. We construct a label
co-occurrence graph (both weighted and unweighted versions) based on training
data and perform community detection to partition the label set. We include
Binary Relevance and Label Powerset classification methods for comparison. We
use gini-index based Decision Trees as the base classifier. We compare educated
approaches to label space divisions against random baselines on 12 benchmark
data sets over five evaluation measures. We show that in almost all cases seven
educated guess approaches are more likely to outperform RAkELd than otherwise
in all measures except Hamming Loss. We show that fastgreedy and walktrap
community detection methods on weighted label co-occurrence graphs are 85-92%
more likely to yield better F1 scores than random partitioning. Infomap on the
unweighted label co-occurrence graphs is, on average, better than random
partitioning 90% of the time in terms of Subset Accuracy and 89% of the time in
terms of Jaccard similarity. Weighted fastgreedy is better on average than
RAkELd when it comes to Hamming Loss.
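As an illustration of the graph construction step described above, the following sketch builds a label co-occurrence graph from training label sets. The function name `label_cooccurrence_graph`, the edge-dictionary representation and the toy label sets are assumptions made for this example, not the authors' code. An edge weight counts how many training samples carry both labels; the unweighted version simply keeps the set of edges. A community detection method would then be run on this graph to partition the label set.

```python
from collections import Counter
from itertools import combinations


def label_cooccurrence_graph(label_sets):
    """Return {(label_a, label_b): count} with label_a < label_b.

    label_sets: iterable of label collections, one per training sample.
    """
    weights = Counter()
    for labels in label_sets:
        # Every unordered pair of labels on the same sample co-occurs once.
        for a, b in combinations(sorted(set(labels)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical training label sets, used only to exercise the sketch.
train_labels = [
    {"sports", "news"},
    {"sports", "news", "video"},
    {"music", "video"},
    {"music"},
]

weighted_edges = label_cooccurrence_graph(train_labels)
unweighted_edges = set(weighted_edges)  # drop the counts, keep the edges
```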
Graph Energies of Egocentric Networks and Their Correlation with Vertex Centrality Measures
Graph energy is the energy of the matrix representation of the graph, where
the energy of a matrix is the sum of singular values of the matrix. Depending
on the definition of a matrix, one can contemplate graph energy, Randić
energy, Laplacian energy, distance energy, and many others. Although
theoretical properties of various graph energies have been investigated in the
past in the areas of mathematics, chemistry, physics, or graph theory, these
explorations have been limited to relatively small graphs representing chemical
compounds or theoretical graph classes with strictly defined properties. In
this paper we investigate the usefulness of the concept of graph energy in the
context of large, complex networks. We show that when graph energies are
applied to local egocentric networks, the values of these energies correlate
strongly with vertex centrality measures. In particular, for some generative
network models graph energies tend to correlate strongly with the betweenness
and the eigencentrality of vertices. As the exact computation of these
centrality measures is expensive and requires global processing of a network,
our research opens the possibility of devising efficient algorithms for the
estimation of these centrality measures based only on local information
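For reference, the definition of graph energy given at the start of this abstract can be written out as follows. The symbols $A$, $\sigma_i$ and $\lambda_i$ are notation introduced here for illustration, and the star and complete graphs are standard textbook cases rather than results from the paper. They are relevant because the egocentric network of a vertex whose neighbours are not connected to each other is a star.

```latex
% Energy of a matrix M: the sum of its singular values.
E(M) = \sum_{i} \sigma_i(M)

% For a graph G on n vertices with adjacency matrix A, which is symmetric,
% the singular values are the absolute values of the eigenvalues:
E(G) = \sum_{i=1}^{n} \sigma_i(A) = \sum_{i=1}^{n} \lvert \lambda_i(A) \rvert

% Star K_{1,n-1}: eigenvalues \pm\sqrt{n-1}, and 0 with multiplicity n-2.
E(K_{1,n-1}) = 2\sqrt{n-1}

% Complete graph K_n: eigenvalues n-1, and -1 with multiplicity n-1.
E(K_n) = (n-1) + (n-1)\cdot\lvert -1 \rvert = 2(n-1)
```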
- …